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Abstract. A complete solution for an inverse problem needs five main steps: choice 
of basis functions for discretization, determination of the order of the model, estimation 
of the hyperparameters, estimation of the solution, and finally, caracterisation of the 
proposed solution. Many works have been done for the three last steps. The two first 
have been neglected for a while, in part due to the complexity of the problem. However, 
in many inverse problems, particularly when the number of data is very low, a good 
choice of the basis functions and a good selection of the order become primordial. In 
this paper, we first propose a complete solution whithin a Bayesian framework. Then, 
we apply the proposed method to an inverse elastic electron scattering problem. 



1. Introduction 

In a very general linear inverse problem, the relation between the data y 
[yi 7 • ' ' j VmY and the unknown function /(.) is 



II; 



J hi{r) f(r)dr, i=l,---,m, (1) 



where hi(r) is the system response for the data yt. We assume here that hi(r) 
are known perfectly. The first step for any numerical processing is the choice of a 
basis function bj(r) and an order k, in such a way to be able to write 

k 

f(r) = J2^b j (r). (2) 

This leads to 

y = Ax + e (3) 
with y = [yi, ■ ■ • ,y m ]*, x = [x\, ■ ■ • ,x fe ]' and 



J h t {r)bj{r)dr, i = 1, • • • , m, j = 1, ■ ■ ■ , k (4) 



Presented at the 19th Int. worskhop on Bayesian and Maximum Entropy methods (MaxEnt 
1999), Aug. 2-6, 1999, Boise, Idaho, USA 



where e — [ei, • • • , e m ]* represents the errors (both the measurement noise and the 
modeling and the approximation related to the numerical computation of matrix 
elements Aij). Even, when the choice of the basis functions h(r) and the model 
order k is fixed, obtaining a good estimate for x needs other assumptions about 
the noise e and about x itself. The Bayesian approach provides a coherent and 
complete framework to handle the random nature of e and the a priori incomplete 
knowledge of x. 

The first step in a Bayesian approach is to assign the prior probability laws 
p(y | x, <fr, k, I) — p e {y — Ax \ <p, k, I), p(x \ 0, k, I), p(4> \ k, I) and p(ip \k,l), where 
p e (y — Ax\<p, k, I) is the probability law of the noise, and (</>, if)) the hyperparame- 
ters of the problem. Note that x represents the unknown parameters, k = dim(cc) 
is the order of the model, m = dim(y) is the number of the data and I is an index 
to a particular choice of basis functions. Note that the elements of the matrix A 
depend on the choice of the basis functions. However, to simplify the notations, 
we do not write this dependance explicitely. We assume that we have to select 
one set I of basis functions among a finite set (indexed by [1 : l ma x\) of them. 
Thus, for a given I £ [1, l max ] and a given model order k € [1, k max ], and using the 
mentionned prior laws, we define the joint probability law 

p(y, x,(j>,ij>\ k, I) = p(y | x, 0, k, I) p(x \ i/>, k, I) p(<f> \ k, I) p(ip \k,l). (5) 

From this probability law, we obtain, either by integration or by summation, any 
marginal law, and any a posteriori probability law using the Bayes rule. 
What we propose in this paper is to consider the following problems: 

— Parameter estimation: 




x 



(6) 



where 



p(x | y, 0, tp, k, I) = p(y, x\4>,ip, k, I) / p(y | 0, -0, k, I) 
p(y, x\<p,ip, k, I) = p(y | x, 4>, k, l)p{x | ip, k, I) 



(7) 
(8) 



and 





— Hyperparameter estimation: 




(0,0) 



(10) 



where 



p{4>, tp\y,k,l)= p(y, <f>,ip\k, I) / p(y | k, I) 



(11) 



and 




(12) 



Model order selection: 

k = argmax |p(fc | y, Z)| (13) 



where 



and 



p(k\y,l)=p(y\k,l)p(k)/p(y\l) (14) 

p(y\l)= f^p(y\k,l)p(k). (15) 
fe=i 

— Basis function selection: 

I = argmax {p(l \ y)} (16) 
i 

where 

P(l\y)=p(y\l)p(l)/ P (y) (17) 

and 

'max 

p(v) = X>(vI0p(0- (is) 

i=l 

— Joint parameter, hyperparameter, model order and basis function estimation: 

{x,4>,^,k,l) = argmax {p(y, x, <p, ip \ k, l)p(k)p(l)} . (19) 

As it can be easily seen, the first problem is, in general, a well posed problem 
and the solution can be computed, either analytically or numerically. The others 
(excepted the last) need integrations. These integrals can be done analytically 
only in the case of Gaussian laws. In other cases, one can either use a numerical 
integration (either deterministic or stochastic) or to resort to approximations such 
as the Laplace's method which allows to obtain a closed-form expression for the 
criterion to optimise. 

Here, we consider these problems for the particular case of Gaussian prior laws: 



p(y\x,<f>,k,l) - Ar(Ax,^ = (27r/0)-" l/2 exp 
p(x\^,k,l) = M (o, il) = (2^/^)- k/2 ex 



~4>\\y- Ax\\ 2 



(20) 
(21) 

where 4 and 4- are respectively the variance of the noise and the parameters. 



l4>\\y-Ax\\ 2 -\^\\x\\ 2 



2. Parameter estimation 

First note that in this special case we have 

p(y, x | 0, 4,, k, I) = (2n/^- m/2 (27r/^)- fe/2 exp 

Zi Z 

(22) 

Integration with respect to x can be done analytically and we have: 

p(y | 0, V, fc, = J p{y, x\<t>,i/;, k, I) dx = N (0, P y ) , (23) 

with 

P v = -AA t + -I= -(AA' + XI) and X=%. (24) 

y ip (f> v 

It is then easy to see that the a posteriori law of a; is also Gaussian: 

p(x\y, (f>,ip,k,l) = JV(sc,p) with P = i(A*A + A/)" 1 and 2 = </>PA*y. 

(25) 

Thus the parameter estimation in this case is straightforward: 

x = argmax{p(cc \y,<f),ip,k,l)} — argmin {Ji(x)} , (26) 
x x 

with 

J 1 (x) = \\y-Ax\\ 2 + X\\x\\ 2 , (27) 

which is a quadratic function of x. The solution is then a linear function of the 
data y and is given by 

x = K(X)y with K(X) = (A t A + XI)- 1 A t . (28) 

3. Hyperparameter estimation 

For the hyperparameter estimation problem we note that: 
p{4>,i>\y,k,l) = — | - p(y\cp,i>,k,l) 



p(0\k,l)p(^\k,l) 



(27r)- m / 2 |P,|- 1/2 ex P 



(29) 



p(y I k,i) 

Thus, the hyperparameter estimation problem becomes: 

= argmax I p(4>,ip \y,k,T)\ = argmin {J 2 {<P, ip)} (30) 

where 

J 2 (<f>, V) = - lnpfa | fe, - lnp(V | fc, + \ In |P„| + \ y'P^y. (31) 



Unfortunately, in general, there is not an analytical expression for the solution, 
but this optimization can be done numerically. Many works have been investigated 
to perform this optimization appropriately for particular choices of p(<j> \ k, I) and 
p(tp | k, I). Among the others, we may note the choice of improper prior laws such 
as Jeffry's prior p(<f> \ k, I) oc ^ and p(ip k, I) oc ^ or proper prior laws of uniform 

p((f> | k, I) = and p(if) \ k, I) = ^ \^ or still the proper Gamma prior 

laws. 

One main issue with improper prior laws is the existence of the solution, be- 
cause p((f>, tp\y,k, I) may even not have a maximum or its maximum can be located 
at the border of the domain of variation of ((f), ip). Here, we propose to use the 
following proper Gamma priors : 



>E{^} = a 1 /p 1 (32) 
E{V>} = a 2 //3 2 . (33) 



p(<f>) = ^(ai.ftJoc^-^exphM 
p(if>) - e(a 2 ,0) oc V^-^exph&V] 
With these priors, we have 

J 2 (</>» - {l-a l )\n^+(l- a2 )\n^ + p 1 <P + M + ^\n\P v \ + ^y t p- 1 y. (34) 

The second main issue is the numerical optimization. Many works have been 
done on this subject. Among the others we can mention those who try to integrate 
out one of the two parameters directly or after some transformation. For example 
transforming (<fr, tp) — ► ((f), A) and using the identities 



and 

we have 
and 



\AA* + A/1 = \ m - k \A l A + A/1 



(AA' + Xiy 1 = \(I-AK(\)) 



In IP 



y'Py'y 

Then, we obtain 

J 2 (^V)= (l-ai-^)ln, 
+ i in A*A+p 



-mint/) - k In A + In \A L A + A/| 
y\l - AK(\))y = 0y 4 (y - Ax) = <f>y t (y- y). 



i+(l-a 2 -f)lnV 

+ ly t (y-y)- 



■ M + M> 



or 



J 2 ((j), A) = (2 - ai - a 2 - f ) lnci + (1 - a 2 - §) In A + M + 
+1 In|A*A + AJ| + |y*(i/-y). 



(35) 
(36) 

(37) 
(38) 

(39) 
(40) 



For fixed A, equating to zero the derivative of this expression with respect to (f> 
has an explicite solution which is 



dJ 2 (<f>,\) 







,m 



(— +ai +a 2 -2)/ 



/3i + A/? 2 + iy 4 (y-y) 



(41) 



Putting this expression into J 2 we obtain a criterion depending only on A which 
can be optimized numerically. In addition, it is possible to integrate out <p to 
obtain p(X\y, k, I), but the expression is too complex to write. 



4. Joint estimation 

One may try to estimate all the unknowns simultaneously by 

ix,<p,ip,k,l) = argmax {p(x, <f>, ip, k, l\y} = argmin { J$(x, <j>, ip, k, I)} (42) 

(X,(j>,ip,k,l) (X,c/>,ip,k,l) 

where 

J 3 (x,cp,ip,k,l) = - lnp(fc) - lnp(Z) - (f +«i - l)bx«^ 

-(| + a 2 - 1) InV + <f> (Pi + \\\y- Ax\\ 2 ) + iP(/3 2 + \\\xf) 

(43) 

The main advantage of this criterion is that we obtain explicit solutions for x, <p 
and ip by equating to zero the derivatives of J^{x, (p, ip, k, I) with respect to them: 

x = (A 1 A + \I)~ 1 A t y, with A = (p/ip; 
0=(f +a l -l)/(fh + \\\y-Axf)- (44) 
^ = (|+a 2 -l)/(/3 2 + ip|| 2 ). 

We can not obtain closed form expressions for k and I which depend on the partic- 
ular choice for p(k) and p{l). These relations suggest an iterative algorithm such 
as: 

Joint MAP estimation algorithm 1 

for I 1 . Imax 

for k = 1 : k max 

compute the elements of the matrix A; 
initialize A = Ao ; 
repeat until convergency: 

x = (A 1 A + ^I)~ 1 A t y; 
£= (f + a 1 -l)/(/3 1 + ±\\y-Ax\\ 2 ); 
£=(!+a a -l)/(ft + £||£|| 2 ) 

end 

compute J(k, I) = J%(x, <p, ip, k, I) ; 
end 

end 

choose the best model and the best order by 

(l,k) = argmin (fc {J(k,l)} 

Note however that, for fixed x, <p and ip, the criteria J 3 in Q43|) or J 5 in p7|) 
are mainly linear functions of k if we choose a uniform law for p{k). This means 



that we may not have a minimum for these criteria as a function of k. The choice 
of the prior p(k) is then important. One possible is the following: 



2{k max -k) 



1 < k < k n 



m = i w^-d (45) 

which is a decreasing function of k in the range £ [1, k max ] and zero elsewhere. 
This choice may insure the existence of a minimum if k max is choosed appropriately. 
For p(l) we propose to choose a uniform law, because we do not want to give any 
favor to any model. 

Another algorithm can be obtained if we replace the expression of x into J3 to 
obtain a criterion depending only on {(f), tp) 

Ji(<l),ip,k,l) = -lnp(fc) -\np(l) - (f + c*i - l)ln0- (| +a 2 - l)lni/> 
+0 (A + |||y- y(A)|| 2 ) + V (A + ip(A)|| 2 ) 

(46) 

or on ((j), A): 

J 5 (0, A, k,l)= - lnp(fc) - lnp(i) - (^ + ax + a 2 - 2) In ct> - (f + a 2 - 1) In A 

+0 (0i + Illy - y(A)|| 2 ) + (A^) (/% + |P(A)|| 2 ) 

(47) 

and then optimize it with respect to them. In the second case, we can again obtain 
first <fr and put it's expression 



m + k , . 

— h «i + a 2 - 2 / 



(/3 1 + i||y-y(A)|| 2 ) + Aa3 2 + ip(A)|| 2 ) 

(48) 

in the criterion to obtain another criterion depending only on A and optimize it 
numerically This gives the following algorithm: 



Joint MAP estimation algorithm 2 

for k — 1 : k nlax 

compute the elements of the matrix A; 
for A 6 10[- 8:1:4 1 

compute x = (A A + \I)~ 1 A t y and y = Ax 

compute 4> using (eq. 



compute J(A) = Js((j), X,k,l) (eq. 47 ) 
end 

choose A = argmin A { J(A)} 
compute x = (A 1 A + \I)~ l A t y ; 
compute using (eq.j48|); 
compute J(k,l) = Js((f),\,k,l) (eq. 47) 
end 

end 

choose the best model and the best order by 

(l,k) = argmin (fe0 {J(k,l)} 



5. Model order selection 

The model order selection 



k = argmax {p(k \ y, I)} = argmin { Je(k)} (49) 

k k 

with 

Mk) = -hxp(k)-hip(y\k,l) (50) 
needs one more integration 

p(v I M) = ffp{vA^\h i) d<p dip. (5i) 

or 

P (y \k,l) = JJ p(y, 0, X\k, I) d0dA. (52) 

where p(y, <f>, X\k, I) cx exp [— J 2 (</>, A)] given by (|4C|). As we mentionned in pre- 
ceeding section, these integrations can only be down numerically. A good approx- 
imation can be obtained using the following: 

p(y\k,l) = J J p(y,(j),^\k,l)d^di(} ~ ^^2>(y|<^-,V>;, M) (53) 

i 3 

where {4>j} and {tpi} are samples generated using the prior laws p(4>) and p(ip)- 

6. Best basis or model selection 

The model selection 

I = argmax {p{l \ y)} — argmin{J7(£)} (54) 
i i 

with 

J 7 (l) = -lnp(l)-\My\l) (55) 

does not need any more integration, but only one summation. Choosing p(l) 
uniform and making the same previous approximations we have 

J 7 (l) = -In f^p(y\k,l)p(k). (56) 

7. Proposed algorithms 

Based on the equations (|55|), (|5^), (|50|), ( |39"| ) and (^) we propose the following 
algorithm: 



Marginal MAP estimation algorithm 2 
Generate a set of samples {4>j} drawn from p(<f>) 
Generate a set of samples {ipi} drawn from p(ip) 

for / — 1 . Imax 

for k — 1 ; k max 

compute the elements of the matrix A; 

for <j> £ {<f)j} 
for ip E {-ipi} 

compute A = <f>/i/j, x — (A A + \I)~ 1 A t y and y = Ax 
compute p^(i,j,k,l) = exp[- J 2 (<j>j,ipi)] (eq. ||) 
end 

normalize p^(i,j,k,l) = p^(i, j, k, I) / Y^iP^{hj> k ,l) 
compute P4,{j,k,l) = Y^iP^ih j, M) 
end 

normalize p^{j,k,l) = p<j,{j,k,l) / J2jP4>ti, k J) 
compute Pk(k,l) = J2jP<j>ti, k J) 
end 

normalize p k {k, I) = pu(k, I) / J2k Pk( k > I) 
compute pi(l) = ^2 k Pk(k,l) 

end 

normalize p t (l) = p t (l) / J2iP( l ) 

choose the best model by I = argmaxj {pi(l)} 

choose the best model order by k = argmaxj. |p/c(fc, Z)| 

choose the best value for cf> = (j>j with j = argmax^- |p0(j, ^, fc) j" 
choose the best value for ijj = with i = argmax^ v2ty(i, j ,1, k)\ 
compute A = (jy/ip 

compute the elements of the matrix A for I — I and k = k 
compute x = (A 1 A + \I)^ l A t y . 



8. Application: Electron scattering data inversion 

Elastic electron scattering provides a mean of determining the charge density of a 
nucleus, p(r), from the experimentally determined charge form factor, F(q). The 
connection between the charge density and the cross section is well understood 
and in plane wave Born approximation F(q) is just the Fourier transform of p(r) 
which for the case of even-even nuclei, which we shall consider, is simply given by 

/>oo 

F(q)=4ir r 2 J (qr)p(r)dr (57) 



where Jo is the spherical Bessel function of zero order and q is the absolute value 
of the three momentum transfer. 

We applied the proposed method with the following usual discretization pro- 



cedure: 

, (r) = {£}=i*i*i(r) (58) 

which results in 

F(g) = 4 7 r^x j f ° r 2 J {qr)bj{r)Ar (59) 

and 

y = Ax + e (60) 

where x is a vector containing the coefficients {xj,j = 1, • • • , k}, y is a vector 
containing the form factor data {F(qi), i — 1, • • ■ , m) and A an (m x fc) matrix 
containing the coefficients Ajj given by 

/•Ac 

Aij=4Tr r 2 J (q t r)b 3 (r)dr. (61) 
Jo 

To compute j4,j we define a discretization step Ar = R c /N, a vector r = 
{r„ = (n — l)Ar, n = 1, • • ■ , iV}, a, (N x k) matrix £? with elements -B njJ - = bj(r n ), 
a (m x iV) matrix C with elements Ci jrl = (47rAr)r^ Jo(qir n ) such that we have 
A = C.B. Note also that when the vector x is determined, we can compute 
P = {p{ r n), n = 1, • • • , N} by p = Bx. 

To test the proposed methods, we used the following simulation procedure: 

— Select a model type I and an order k and generate the the matrixes B, C and 
A, and for a random set of parameters x generate the data y = Ax. 

— Add some noise e on y to^obtain y — Ax + e; 

— Compute the estimates I, k, x, y = Ax and p = Bx and compare them with 
I, k, x, y = Ax and p = Bx. 

We choosed the following basis functions: 

— 1 = 1: bj(r) = Jo(qjr) — This is a natural choice due to the integral kernel 
and the orthogonality property of the bessel functions. 

— 1 = 2: bj(r) = sinc(<7jr) — This choice is also natural due to the orthogo- 
nality and the limited support hypothesis for the function p(r). 

— 1 = 3: bj(r) = cxp [— ^(<7j?") 2 ] — This choice can account for the positivity 
of the function p(r) if {xj} are constrained to be positive. 

— 1 = 4: bj(r) = cxp [— \{qjr) 21 \ Jo(qjr) — This choice combines the first and 
the third properties. 

— 1 = 5: bj(r) = 1/ (cosher)) — This choice has the same properties of the 
third one. 

— 1 = 6: bj(r) = 1/(1 + (qjr) 2 ) — This choice has the same properties of the 
third one. 

In all these experiments we choosed k = 6, m = 20, N = 100, R c = 8 and 
qi = iTT/R c . The following figures show typical solutions. Figures 1 and 2 show 
the details of the procedure for the case 1 = 1. Figures 3, 4 and 5 show the results 
for the cases / = 1 to I = 6. 




p(k,l) 



p{l) and p(k\l) 



i=i 

and 

k 



and 



Fig. 2: a) p(k,l\y), p(l\y) and Z), b) original p(r) 

and estimated p(r), c) original F^qi) and estimated F(qi). 




Fig. 3: Left: I = 1 Right: I = 2 

a) basis functions bj (r) ; 

b) p(k,l\y); 

c) p(k\y) and p(k\y, I): 

d) p[r) and p(r); 
c) F( gi ) and F( gi ). 




Fig. 4: Left: I = 3 Right: I = 4 

a) basis functions bj(r); 

b) p(k,l\y); 

c) p(k\y) and p(k\y, I): 

d) p[r) and p(r); 
c) F( gi ) and F( gi ). 




Fig. 5: Left: I = 5 

a) basis functions bj (r) ; 

b) p(k,l\y); 

c) p(k\y) and p(k\y, I): 

d) p[r) and p(r); 
c) F( gi ) and F( gi ). 



Right: I = 6 



Note that in these tests, we know perfectly the model and generated the data 
according to our hypothesis. To test the method to a more realistic case, we chose 
a model for which we can have an exact analytic expression for the integrals. For 
example, if we choose a symmetric Fermi distribution p| 



p(r) 



cosh(R/d) 



cosh(i?/d) + cosh(r/d) 



(62) 



an analytical expression for the corresponding charge form factor can easily be 
obtained ||: 



F(q) = 



4ir 2 ad cosh(R/d) 
q sinh(R/d) 



R cos(qR) Trdsiii(qR) cosh(irqd) 



smh(Trqd) 



sinh (nqd) 



(63) 



Only two of the parameters a, R and d are independent since the charge density 
must fulfill the normalization condition 



47r / r 2 p(r) dr 



(64) 



Figure 6 shows the theoretical charge density p(r) of 12 C (Z=6) obtained 
from (H) for r E [0, 8] fm with R = 1.1 A and d = 0.626 fm and the theoretical 
charge form factor F(q) obtained by ( |63|) for q 6 [0, 8] fm -1 and the 15 simulated 
data: 

q = [0.001, .5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0] faT 1 
which are used as inputs to the inversion method. 

Charge density 



Momentum transfer 



Fig. 6: Theoretical charge density p(r), charge form factor log | ^(g) | and the 
data [stars] used for numerical experiments [right] . 

First note that, even with the exact data, there are an infinite number of 
solutions which fits exactly the data. The following figure shows a few set of these 
solutions. 



model number 



model order 




1 2 3 4 5 6 
model number 





1 23456789101 11 2131415 
model order 



Fig. 7: &)p(k,l\y); 

b) p(k\y) and p(k\y, I); 

c) p[r) and p(r); d) F(qi) and F(qi). 



9. Conclusions 

We discussed the different steps for a complete resolution of an inverse problem 
and focused on the choice of a basis function selection and the order of the model. 
An algorithm based on Bayesian estimation is proposed and tested on simulated 
data. 
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